
Exponential Smoothing

Lecture 7  ·  ETS Models

How should recent observations be weighted relative to older ones?

Simple exponential smoothing gives more weight to recent observations.
The SES forecast is a weighted average of all past observations, with weights that decay exponentially:
ŷT+1|T = αyT + α(1−α)yT−1 + α(1−α)²yT−2 + …
α is the smoothing parameter, 0 < α ≤ 1.
  • α near 1: almost all weight on the most recent observation — the forecast tracks the data closely but is noisy.
  • α near 0: weight is spread broadly across history — the forecast is very smooth but slow to react to change.
α is estimated from the data by minimizing the sum of squared one-step forecast errors.
SES is equivalently expressed as a state-space model with a level component.
Forecast equation:
ŷt+h|t = ℓt
Level equation:
ℓt = αyt + (1−α)ℓt−1
The level ℓt is a weighted average of the current observation and the previous level estimate. All future forecasts equal the current level — SES produces a flat forecast.
SES is optimal when the series has no trend and no seasonality — it is equivalent to an ARIMA(0,1,1) model.
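The course code uses fpp3 in R, but the SES recursion itself is language-agnostic. Below is a minimal Python sketch (the series and α values are invented examples) showing that the recursive level form and the exponentially weighted average give the same forecast, and how α would be chosen by minimizing the sum of squared one-step errors:

```python
# Illustrative sketch (Python, not fpp3); the data and α are made-up examples.

def ses_level(y, alpha):
    """Recursive level update ℓt = α·yt + (1−α)·ℓ(t−1), starting from ℓ0 = y0.
    Returns the final level, which is the flat SES forecast."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

def ses_weighted(y, alpha):
    """Same forecast via the exponentially decaying weighted average."""
    T = len(y)
    f = sum(alpha * (1 - alpha) ** j * y[T - 1 - j] for j in range(T - 1))
    return f + (1 - alpha) ** (T - 1) * y[0]  # leftover weight on ℓ0 = y0

def ses_sse(y, alpha):
    """Sum of squared one-step errors, the criterion minimized to estimate α."""
    level, sse = y[0], 0.0
    for obs in y[1:]:
        sse += (obs - level) ** 2  # error e(t) = y(t) − ℓ(t−1)
        level = alpha * obs + (1 - alpha) * level
    return sse

y = [10.0, 12.0, 11.0, 13.0, 12.5]
# crude grid search over α; in practice the optimizer handles this
best_alpha = min((a / 100 for a in range(1, 101)), key=lambda a: ses_sse(y, a))
```

The two forms agreeing term by term is exactly the equivalence claimed above; fpp3's ETS() performs the estimation step internally.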

How does exponential smoothing handle a trending series?

Holt’s linear trend method adds a trend component to SES.
Forecast equation: ŷt+h|t = ℓt + hbt
Level equation: ℓt = αyt + (1−α)(ℓt−1 + bt−1)
Trend equation: bt = β*(ℓt − ℓt−1) + (1−β*)bt−1
Two smoothing parameters: α (level) and β* (trend), both in (0, 1].
Problem: Holt’s method extrapolates the trend indefinitely into the future, which is often unrealistic at long horizons. Trends rarely continue unchanged forever.
The damped trend method prevents over-extrapolation.
A damping parameter φ (0 < φ ≤ 1) multiplies the trend at each step, causing it to shrink toward zero as the horizon increases:
ŷt+h|t = ℓt + (φ + φ² + … + φʰ)bt
  • φ = 1: identical to Holt’s (no damping).
  • φ near 0: the trend is heavily damped; forecasts quickly flatten.
  • φ = 0.88–0.98: typical estimated values; the trend fades slowly but persistently.
The damped trend method is one of the most accurate and robust methods across a wide variety of series — it is the recommended default when a trend is present.
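To see the flattening effect concretely, here is a short Python sketch of the damped forecast equation (the level, trend, and φ values are invented; the course itself works in fpp3/R). Because the damping sum φ + φ² + … + φʰ converges to φ/(1−φ) when φ < 1, long-horizon forecasts level off instead of growing without bound:

```python
# Illustrative sketch (Python, not fpp3): ŷ(t+h|t) = ℓt + (φ + φ² + … + φ^h)·bt.
# The states ℓ = 100, b = 2 and φ = 0.9 are made-up values.

def damped_forecast(level, trend, phi, h):
    damp = sum(phi ** j for j in range(1, h + 1))  # φ + φ² + … + φ^h
    return level + damp * trend

level, trend, phi = 100.0, 2.0, 0.9
short = damped_forecast(level, trend, phi, h=1)    # ≈ 100 + 0.9·2 = 101.8
long = damped_forecast(level, trend, phi, h=500)   # ≈ 100 + 2·0.9/0.1 = 118
# With φ = 1 the damping sum is just h, recovering Holt's linear trend.
```

The φ = 1 case collapsing to Holt's method is why damping is a strict generalization rather than a different model.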

How is seasonality incorporated into exponential smoothing?

The Holt-Winters method adds a seasonal component.
Additive Holt-Winters (for constant seasonal amplitude):
ŷt+h|t = ℓt + hbt + st+h−m(k+1)
ℓt = α(yt − st−m) + (1−α)(ℓt−1 + bt−1)
bt = β*(ℓt − ℓt−1) + (1−β*)bt−1
st = γ(yt − ℓt−1 − bt−1) + (1−γ)st−m
Three smoothing parameters: α (level), β* (trend), γ (seasonal). As before, α and β* lie in (0, 1]; γ satisfies 0 ≤ γ ≤ 1 − α. m is the seasonal period; k = ⌊(h−1)/m⌋ ensures the forecast uses the most recent seasonal index estimates.
Multiplicative seasonality works when the seasonal amplitude grows with the level.
Multiplicative Holt-Winters — the seasonal component multiplies (rather than adds to) the level:
ŷt+h|t = (ℓt + hbt) · st+h−m(k+1)
The seasonal indices now represent multipliers (e.g., 1.25 for a month 25% above average) rather than additive deviations.
Choosing between them: if seasonal swings are roughly the same size every year, use additive. If they grow proportionally with the series level, use multiplicative (or log-transform and use additive).
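The additive Holt-Winters equations above are just three coupled updates per observation. Here is a minimal Python sketch of one update step (the smoothing parameters, initial states, and m = 4 quarterly data are invented for illustration; fpp3 handles all of this inside ETS()):

```python
# Illustrative sketch (Python, not fpp3) of one additive Holt-Winters update.
# α = 0.4, β* = 0.2, γ = 0.1, and the states below are made-up values.

def hw_additive_update(y, level, trend, seasonal, alpha, beta, gamma, m):
    """Apply the level/trend/seasonal equations for one new observation y.
    `seasonal` holds the last m seasonal indices, oldest first."""
    s_old = seasonal[0]  # s(t−m), the index from one season ago
    new_level = alpha * (y - s_old) + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    new_season = gamma * (y - level - trend) + (1 - gamma) * s_old
    return new_level, new_trend, seasonal[1:] + [new_season]

def hw_forecast(level, trend, seasonal, m, h):
    """h-step forecast: level + h·trend + the matching seasonal index."""
    return level + h * trend + seasonal[(h - 1) % m]

# One quarter with a −5 seasonal dip: observe y = 97 against level 100, trend 1.
l, b, s = hw_additive_update(97.0, 100.0, 1.0, [-5.0, 0.0, 5.0, 0.0],
                             alpha=0.4, beta=0.2, gamma=0.1, m=4)
```

For the multiplicative variant, the subtractions of `s_old` become divisions and the forecast adds the trend before multiplying by the seasonal index.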

What unifies all these methods into a single framework?

ETS stands for Error, Trend, Seasonal.
The ETS framework (Hyndman et al., 2002) expresses every exponential smoothing method as a statistical state-space model with three components, each taking one of several forms:
Error (E): Additive (A) or Multiplicative (M).
Trend (T): None (N), Additive (A), Additive Damped (Ad).
Seasonal (S): None (N), Additive (A), Multiplicative (M).
ETS(A,N,N) = SES  ·  ETS(A,A,N) = Holt’s  ·  ETS(A,Ad,N) = Damped trend  ·  ETS(A,A,A) = Holt-Winters additive.
The ETS notation encodes model structure compactly.
Trend \ (Error: A)   Seasonal: N        Seasonal: A         Seasonal: M
None (N)             A,N,N — SES        A,N,A               A,N,M
Additive (A)         A,A,N — Holt       A,A,A — HW add.     A,A,M — HW mult.
Damped (Ad)          A,Ad,N — Damped    A,Ad,A              A,Ad,M
The named cells are the most commonly used models. With trend restricted to N, A, and Ad, there are 18 possible ETS specifications in total (2 error × 3 trend × 3 seasonal forms; the table shows the 9 additive-error models). ETS(y) in fpp3 selects the best-fitting one automatically via AICc.
ETS selects the best model using AICc.
ETS(y) fits the candidate models (up to 18, excluding a few numerically unstable combinations) and returns the one with the lowest AICc, the information criterion that penalizes complexity for small samples.
You can also constrain the search:
# Let fpp3 choose automatically
model(ETS(y))

# Force a specific model
model(ETS(y ~ error("A") + trend("Ad") + season("M")))
Important: AICc chooses the best in-sample fit penalized for complexity. Always confirm the selection makes sense with a residual diagnostic (gg_tsresiduals()) and compare out-of-sample accuracy with benchmarks.
The state-space formulation provides exact prediction intervals.
The ETS state-space model has two equations:
Measurement equation: yt = h(xt−1) + k(xt−1)εt
State equation: xt = f(xt−1) + g(xt−1)εt
xt is the state vector (level, trend, seasonal). εt is i.i.d. with mean zero.
Additive-error models give normally distributed prediction intervals. Multiplicative-error models require simulation (bootstrap) for exact intervals but are often better for series that cannot be negative.
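The simulation approach for multiplicative-error intervals is straightforward to sketch. Below is a Python illustration for the simplest case, ETS(M,N,N), where yt = ℓ(t−1)(1 + εt) and ℓt = ℓ(t−1)(1 + αεt); the starting level, α, σ, and horizon are all invented values, and real software would simulate many more paths:

```python
# Illustrative sketch (Python, not fpp3): bootstrap prediction intervals
# for ETS(M,N,N).  level = 100, α = 0.3, σ = 0.05, h = 6 are made-up values.
import random

def simulate_paths(level, alpha, sigma, h, n_paths, seed=1):
    """Simulate n_paths sample paths h steps ahead; return sorted final values."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        lvl, y = level, level
        for _ in range(h):
            eps = rng.gauss(0.0, sigma)
            y = lvl * (1 + eps)            # measurement equation
            lvl = lvl * (1 + alpha * eps)  # state (level) update
        finals.append(y)
    return sorted(finals)

paths = simulate_paths(level=100.0, alpha=0.3, sigma=0.05, h=6, n_paths=2000)
lo = paths[int(0.025 * len(paths))]  # 2.5th percentile
hi = paths[int(0.975 * len(paths))]  # 97.5th percentile
# (lo, hi) approximates a 95% interval for y(t+6); it is slightly
# asymmetric because the errors scale with the level.
```

The same percentile-of-simulated-paths idea is what forecast() applies when analytic normal intervals are unavailable.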

When does ETS outperform regression?

ETS requires no external predictors.
  • When you don’t have reliable predictors available at the forecast horizon, ETS needs only the history of yt itself.
ETS adapts to changing levels and trends automatically.
  • A regression model with a fixed trend line cannot adapt if the growth rate shifts mid-series. ETS updates the level and trend at every period.
ETS excels at short-to-medium horizon forecasts for business series.
  • Retail sales, energy demand, inventory — series that have clear level, trend, and seasonal patterns but no obvious external drivers.
Use regression (or dynamic regression) when:
  • You have reliable predictors that improve accuracy beyond what the past series alone can offer.

ETS in fpp3: a complete workflow

Fit and auto-select:
  • fit <- data |> model(ETS(y))
  • report(fit)     # shows selected model and parameters
Diagnose residuals:
  • gg_tsresiduals(fit)   # ACF, histogram, time plot of residuals
Forecast and plot:
  • fc <- fit |> forecast(h = 24)
  • fc |> autoplot(data)   # with 80% and 95% PI shading
Evaluate accuracy:
  • accuracy(fc, test_data) |> select(MASE, RMSE)

Chapter 8 in summary

Exponential smoothing weights recent observations more heavily than old ones.
  • The smoothing parameters (α, β*, γ, φ) are estimated by minimizing squared errors.
The ETS framework unifies all variants as state-space models.
  • Error (A or M), Trend (N, A, Ad), Seasonal (N, A, M) — 18 possible combinations.
AICc selects the best ETS model automatically.
  • Always verify with residual diagnostics and out-of-sample accuracy.
Damped trend is the single best default for trended series.
  • ETS(A,Ad,N) or ETS(A,Ad,M) depending on seasonality.

Key Terms

ETS(A,N,N) is identical to ARIMA(0,1,1).
This equivalence shows that exponential smoothing and ARIMA models are not competing philosophies — they are closely related families. Specifically:
  • SES = ETS(A,N,N) = ARIMA(0,1,1) with θ1 = α − 1.
  • Holt’s linear method = ETS(A,A,N) = ARIMA(0,2,2).
  • Damped trend = ETS(A,Ad,N) = ARIMA(1,1,2).
The key practical difference: ARIMA models can capture more complex autocorrelation structures (including moving average components and long-memory), while ETS focuses on a structured decomposition of level, trend, and seasonality. In practice, run both and compare on test data.
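The SES/ARIMA(0,1,1) equivalence can be checked numerically. Here is a Python sketch (the series and α = 0.4 are invented) comparing SES one-step forecasts against the ARIMA(0,1,1) forecast recursion ŷ(t+1|t) = yt + θεt with θ = α − 1, both initialized at y0:

```python
# Illustrative sketch (Python, not fpp3): SES and ARIMA(0,1,1) with
# θ = α − 1 produce identical one-step forecasts.  Data are made up.

def ses_path(y, alpha):
    """One-step SES forecasts ŷ(t+1|t) for t = 1..T−1."""
    level, out = y[0], []
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
        out.append(level)
    return out

def arima011_path(y, theta):
    """ARIMA(0,1,1) forecasts ŷ(t+1|t) = y(t) + θ·ε(t), ε(t) = y(t) − ŷ(t|t−1)."""
    fcast, out = y[0], []
    for obs in y[1:]:
        eps = obs - fcast      # one-step forecast error
        fcast = obs + theta * eps
        out.append(fcast)
    return out

y = [10.0, 12.0, 11.0, 13.0, 12.5]
alpha = 0.4
a = ses_path(y, alpha)
b = arima011_path(y, theta=alpha - 1)
# a and b agree term by term (up to floating-point rounding)
```

Substituting θ = α − 1 into the ARIMA recursion gives ŷ(t+1|t) = ℓ(t−1) + α·e(t), which is exactly the SES error-correction form.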